week 7: multilevel models

multilevel adventures

more than one type of cluster

McElreath doesn’t cover this in his video lecture, but this is from the textbook and worth discussing.

data(chimpanzees, package="rethinking")
d <- chimpanzees
str(d, vec.len=5)
'data.frame':   504 obs. of  8 variables:
 $ actor       : int  1 1 1 1 1 1 1 1 1 1 1 1 ...
 $ recipient   : int  NA NA NA NA NA NA NA NA NA NA NA NA ...
 $ condition   : int  0 0 0 0 0 0 0 0 0 0 0 0 ...
 $ block       : int  1 1 1 1 1 1 2 2 2 2 2 2 ...
 $ trial       : int  2 4 6 8 10 12 14 16 18 20 22 24 ...
 $ prosoc_left : int  0 0 1 0 1 1 1 1 0 0 0 1 ...
 $ chose_prosoc: int  1 0 0 1 1 1 0 0 1 1 0 0 ...
 $ pulled_left : int  0 1 0 0 1 1 0 0 0 0 1 0 ...

divergent transitions

From McElreath:

Recall that HMC simulates the frictionless flow of a particle on a surface. In any given transition, which is just a single flick of the particle, the total energy at the start should be equal to the total energy at the end. That’s how energy in a closed system works. And in a purely mathematical system, the energy is always conserved correctly. It’s just a fact about the physics.

But in a numerical system, it might not be. Sometimes the total energy is not the same at the end as it was at the start. In these cases, the energy is divergent. How can this happen? It tends to happen when the posterior distribution is very steep in some region of parameter space. Steep changes in probability are hard for a discrete physics simulation to follow. When that happens, the algorithm notices by comparing the energy at the start to the energy at the end. When they don’t match, it indicates numerical problems exploring that part of the posterior distribution.

\[\begin{align*} C &\sim \text{Bernoulli}(p_i) \\ \text{logit}(p_i) &= \alpha_{D_{[i]}} \\ \alpha_j &\sim \text{Normal}(\bar{\alpha}, \sigma) \\ \bar{\alpha} &\sim \text{Normal}(0, 1) \\ \sigma &\sim \text{Exponential}(1) \\ \end{align*}\]

data(bangladesh, package="rethinking")
d <- bangladesh

m1 <- brm(
  data=d,
  family=bernoulli,
  use.contraception ~ 1 + (1 | district),
  prior = c( prior(normal(0, 1), class = Intercept), # alpha bar
             prior(exponential(1), class = sd)),       # sigma

  chains=4, cores=4, iter=2000, warmup=1000,
  seed = 1,
  file = here("files/data/generated_data/m71.1"))

\[\begin{align*} C &\sim \text{Bernoulli}(p_i) \\ \text{logit}(p_i) &= \alpha_{D_{[i]}} + \beta_{D_{[i]}}U_i\\ \alpha_j &\sim \text{Normal}(\bar{\alpha}, \sigma) \\ \beta_j &\sim \text{Normal}(\bar{\beta}, \tau) \\ \bar{\alpha}, \bar{\beta} &\sim \text{Normal}(0, 1) \\ \sigma, \tau &\sim \text{Exponential}(1) \\ \end{align*}\]

m2 <- brm(
  data=d,
  family=bernoulli,
  use.contraception ~ 1 + urban + (1 + urban || district),
  prior = c( prior(normal(0, 1), class = Intercept), 
             prior(normal(0, 1), class = b),
             prior(exponential(1), class = sd)),     

  chains=4, cores=4, iter=2000, warmup=1000,
  seed = 1,
  file = here("files/data/generated_data/m71.2"))

Oops, no divergent transitions.

m2
 Family: bernoulli 
  Links: mu = logit 
Formula: use.contraception ~ 1 + urban + (1 + urban || district) 
   Data: d (Number of observations: 1934) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~district (Number of levels: 60) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.48      0.09     0.32     0.67 1.01     1290     2067
sd(urban)         0.55      0.21     0.11     0.96 1.00      860      912

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -0.70      0.09    -0.88    -0.53 1.00     2275     2893
urban         0.63      0.15     0.33     0.92 1.00     2391     2077

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).